MemGator - A Portable Concurrent Memento Aggregator

Authors

  • Sawood Alam
  • Michael L. Nelson
Abstract

As the number of public web archives grows, it is becoming important to aggregate them for better coverage and completeness. The Memento protocol [2] provides a uniform API for looking up URIs in web archives. Because the protocol is widely supported in the archiving ecosystem, it is now easy to aggregate archive holdings for any URI lookup. However, current applications must either use a custom aggregator implementation or rely on centralized services such as LANL’s Time Travel portal and the ODU Memento Aggregator. While centralized third-party services serve their purpose well, their convenience comes at the cost of customization and control; for example, a client application cannot choose which archives are aggregated. Centralized services are usually good for general use but are not suitable for specialized purposes such as research or heavy-traffic applications. For example, certain archives have IP-based traffic throttling policies that might limit a centralized server under heavy traffic. Similarly, the recent surge in OldWeb.today usage increased the load on archives, and as a result one archive requested to be excluded from polling. This would have been an issue if OldWeb.today had been using a centralized service. There are a few open source aggregator implementations, such as Memento Server and Memento Java Client Library, but they are either outdated or require a server setup. With these issues in mind, we created MemGator, which provides a standalone cross-platform binary without any external dependencies. It can be used as a one-off command that writes the response to standard output, or run as a web service that replicates the necessary features of the centralized Memento aggregator services. We tried to keep the service API as close as possible to LANL’s Time Travel service for greater interoperability.
Both modes (CLI and server) come with a handful of customization options that are documented in the binary itself and can be seen using the standard help flag. One such option is to supply a custom list of archives to aggregate, or to use archive-profile-based ranking to query only the top-K archives. The tool is currently used heavily in OldWeb.today, WAIL, Mink, and our internal archiving research projects. It has proved reliable even under extreme load. An aggregator is a good example of a concurrent application. It relies on various upstream archives, so most of its overall time is spent in network I/O while the process sits idle. Performing these lookups sequentially would make it useless as the number of upstream ...

Similar resources

Enhanced memento's aggregator framework to browse the past web

Browsing the past Web in an easy, complete, and consistent way has become an essential need in recent years. The archived contents are distributed among many systems in different locations, and each one has its own format, protocol, and technology for preservation and retrieval. In this research, we propose the “Enhanced Memento Aggregator” framework which is capable of collecting, filtering, r...

Memento: A Collaborative Semantic Based Infrastructure for Building Assistant Applications

Memento is a software infrastructure to support the construction and evolution of assistant applications, or assistants, that act as adjuncts to the human mind. Each assistant embodies an effective understanding of some information domain or problem domain. The assistant employs this understanding to aid a user or user community in the manipulation, transmission, and storage of meaningful information ...

A Framework for Evaluation of Composite Memento Temporal Coherence

Most archived HTML pages embed other web resources, such as images and stylesheets. Playback of the archived web pages typically provides only the capture date (or Memento-Datetime) of the root resource and not the Memento-Datetime of the embedded resources. In the course of our research, we have discovered that the Memento-Datetime of embedded resources can be up to several years in the future...

Memento: A Framework for Hardening Web Applications

We propose a generic framework called Memento for systematically hardening web applications. Memento models a web application’s behavior using a deterministic finite automaton (DFA), where each server-side script is a state, and state transitions are triggered by HTTP requests. We use this DFA to defend against cross-site request forgery (CSRF) and cross-site scripting (XSS) attacks. The client w...

Adding Time to Linked Data: A Generic Memento proxy through PROV

Linked Data resources change rapidly over time, making a valid consistent state difficult. As a solution, the Memento framework offers content negotiation in the datetime dimension. However, due to a lack of formally described versioning, every server needs a costly custom implementation. In this poster paper, we exploit published provenance of Linked Data resources to implement a generic Memen...

Journal title:
  • TCDL Bulletin

Volume 13

Publication date: 2017